Detecting Uncertainty Cues in Hungarian Social Media Texts
Abstract
In this paper, we aim to identify uncertainty cues in Hungarian social media texts. We present a machine-learning-based uncertainty detector that relies on a rich feature set, including lexical, morphological, syntactic, semantic and discourse-based features, and we evaluate the system on a small set of manually annotated social media texts. We also carry out cross-domain and domain adaptation experiments using an annotated corpus of standard Hungarian texts and show that domain differences significantly affect machine learning results. Furthermore, we argue that differences among uncertainty cue types may also affect the efficiency of uncertainty detection.
Similar Resources
Uncertainty Detection in Hungarian Texts
Uncertainty detection is essential for many NLP applications. For instance, in information retrieval, it is of primary importance to distinguish among factual, negated and uncertain information. Current research on uncertainty detection has mostly focused on the English language; in contrast, here we present the first machine learning algorithm that aims at identifying linguistic markers of unc...
Annotating Uncertainty in Hungarian Webtext
Uncertainty detection has been a popular topic in natural language processing, as reflected in the creation of several corpora for English. Here we show how the annotation guidelines originally developed for English standard texts can be adapted to Hungarian webtext. We annotated a small corpus of Facebook posts for uncertainty phenomena and we illustrate the main characteristics of such te...
Normalisation and Analysis of Social Media Texts
We present a language-independent method for automatic diacritic restoration. The method focuses on low computational resource usage, making it suitable for mobile devices. We train a decision tree classifier on character-based features without involving a dictionary. Since our features require at most a few characters of context, this approach can be applied to very short text segments such as...
An Empirical Study on Uncertainty Identification in Social Media Context
Detecting uncertain text is important to many social-media-based applications, since more and more users rely on social media platforms (e.g., Twitter, Facebook) as an information source from which to produce or derive interpretations. However, existing uncertainty cues are ineffective in the social media context because of its specific characteristics. In this paper, we propose a variant of...
Referential Cohesion in Hungarian: A Developmental Study
Discursive functions are shared across all languages, but each language uses different linguistic means to appropriately establish referential cohesion. Children’s mastery of this cohesion in narrative texts develops gradually and is influenced by development in syntax. Consequently, speakers can employ different strategies, and among the various structural configurations of arguments, some are...